Abstract

We study the mixing time of the Dikin walk in a polytope, a random walk based on the log-barrier from the interior
point method literature. This walk, and a close variant, were studied by Narayanan (2016) and Kannan-Narayanan
(2012). Bounds on its mixing time are important for algorithms for sampling and optimization over polytopes. Here,
we provide a simple proof of their result that this random walk mixes in time $O(mn)$ for an $n$-dimensional polytope
described using $m$ inequalities.
1. Introduction

Sampling a point from the uniform distribution on a polytope $K \subseteq \mathbb{R}^n$ is an extensively-studied problem and is a crucial ingredient in several computational tasks involving convex bodies. Towards this, typically, one sets up an ergodic and reversible random walk inside $K$ whose stationary distribution is uniform over $K$. The mixing time of such a walk determines its efficacy, and, in turn, depends on the isoperimetric constant of $K$ with respect to the transition function of the walk. Starting with the influential work of Dyer et al., there has been a long line of work on faster and faster algorithms for generating an approximately uniform point from a convex body. Moreover, since convex bodies show up in a variety of areas, there is a wide body of work connecting random walks and isoperimetry in convex bodies to several areas in mathematics and optimization.

One such important connection to the interior point method literature was presented in the works of Kannan and Narayanan and of Narayanan, who proposed the Dikin walk in a polytope. Roughly, the uniform version of the Dikin walk, considered by Kannan and Narayanan, when at a point $x \in K$, computes the Dikin ellipsoid at $x$, and moves to a random point in it after a suitable Metropolis filter. The Metropolis step ensures that the walk is ergodic and reversible. The Gaussian version of the Dikin walk, considered by Narayanan, picks the new point from a Gaussian distribution centered at $x$ with its covariance given by the Dikin ellipsoid at $x$, and applies a suitable Metropolis filter. The Dikin ellipsoid at a point $x$ is the ellipsoid described by the Hessian of the log-barrier function at $x$. It was introduced by Dikin in the first interior point method for linear programming.

* Corresponding author
Email addresses: sachdeva@cs.yale.edu (Sushant Sachdeva), nisheeth.vishnoi@epfl.ch (Nisheeth K. Vishnoi)
Preprint submitted to Elsevier

Several virtues of the Dikin ellipsoid (see, e.g., [13]) were used by Kannan and Narayanan and by Narayanan to prove that the mixing time of the Dikin walk is $O(mn)$ starting from a warm start, when $K$ is described by $m$ inequality constraints. Recall that a distribution over $K$ is said to be a warm start if its density is bounded from above by a constant relative to the uniform distribution on $K$. Roughly, the proof (for either walk) consists of two parts: (1) an isoperimetric inequality, proved by Lovász, for convex bodies in terms of a distance introduced by Hilbert, and (2) a bound on the changes in the sampling distributions of the Dikin walk in terms of the Hilbert distance. The bound in (2) was the key technical contribution of Kannan and Narayanan and of Narayanan towards establishing the mixing time of the Dikin walk. We present a simple proof of this bound for the Gaussian Dikin walk, implying that it mixes in time $O(mn)$. Our proof uses well-known facts about Gaussians, and concentration of Gaussian polynomials.

1.1. Dikin walk on Polytopes

Suppose $K \subseteq \mathbb{R}^n$ is a bounded polytope with a non-empty interior, described by $m$ inequalities, $a_i^\top x \ge b_i$, for $i \in [m]$. We use the notation $x \in K$ to denote that $x$ is in the interior of $K$. The log-barrier function for $K$ at $x \in K$ is $F(x) := -\sum_{i \in [m]} \log(a_i^\top x - b_i)$. Let $H(x)$ denote the Hessian of $F$ at $x$, i.e.,

\[ H(x) := \sum_{i \in [m]} \frac{a_i a_i^\top}{(a_i^\top x - b_i)^2}. \]

For all $x \in K$, $H(x)$ is a positive definite matrix, and defines the local norm at $x$, denoted $\|\cdot\|_x$, as $\|v\|_x^2 := v^\top H(x) v$. The ellipsoid $\{z : \|z - x\|_x \le 1\}$ is known as the Dikin ellipsoid at $x$.
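As a concrete illustration of these definitions, the following Python sketch evaluates $H(x)$ and the local norm for a small polytope. The helper names (`slacks`, `hessian`, `local_norm_sq`) and the unit-square example are illustrative assumptions, not part of the original text; matrices are stored as plain lists of rows so that the sketch has no dependencies.

```python
def slacks(A, b, x):
    """Return the slacks a_i^T x - b_i; all are positive iff x is interior."""
    return [sum(a_k * x_k for a_k, x_k in zip(a, x)) - b_i for a, b_i in zip(A, b)]

def hessian(A, b, x):
    """H(x) = sum_i a_i a_i^T / (a_i^T x - b_i)^2, as a list of rows."""
    n = len(x)
    s = slacks(A, b, x)
    if min(s) <= 0:
        raise ValueError("x is not in the interior of K")
    H = [[0.0] * n for _ in range(n)]
    for a, s_i in zip(A, s):
        for j in range(n):
            for k in range(n):
                H[j][k] += a[j] * a[k] / s_i ** 2
    return H

def local_norm_sq(A, b, x, v):
    """||v||_x^2 = v^T H(x) v = sum_i (a_i^T v)^2 / (a_i^T x - b_i)^2."""
    s = slacks(A, b, x)
    return sum((sum(a_k * v_k for a_k, v_k in zip(a, v)) / s_i) ** 2
               for a, s_i in zip(A, s))

# Assumed example: the unit square [0, 1]^2 written as a_i^T x >= b_i.
A = [[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]]
b = [0.0, -1.0, 0.0, -1.0]
x = [0.5, 0.5]
H = hessian(A, b, x)
```

At the center of the square every slack equals 1/2, so `H` is 8 times the identity and the Dikin ellipsoid is a disc of radius $1/\sqrt{8}$, which lies inside the square.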

From a point $x \in K$, the next point $z$ in the Dikin walk is sampled from the Dikin ellipsoid at $x$. The uniform Dikin walk, considered by Kannan and Narayanan, samples the new point $z$ from the uniform distribution in this ellipsoid. In the Gaussian Dikin walk, considered by Narayanan, $z$ is sampled from $g_x$, a multivariate Gaussian distribution centered at $x$ with covariance matrix $\frac{r^2}{n} H(x)^{-1}$, where $r$ is a constant. Thus, the density of the distribution is given by

\[ g_x(z) = \sqrt{\det H(x)} \left(\frac{n}{2\pi r^2}\right)^{n/2} \exp\left(-\frac{n}{2r^2} \|z - x\|_x^2\right). \]

Equivalently, the next point $z$ is given by

\[ z = x + \frac{r}{\sqrt{n}} (H(x))^{-1/2} g, \]

where $g$ is an $n$-dimensional vector with each coordinate of $g$ sampled as an independent standard Gaussian $\mathcal{N}(0, 1)$.

In order to convert this into a random walk that stays inside $K$, with its stationary distribution as the uniform distribution on $K$, we apply the Metropolis filter to obtain the transition probability density $p_x$ of the Gaussian Dikin walk: $\forall z \ne x$, if $z \in K$, $p_x(z) = \min\{g_x(z), g_z(x)\}$ (the walk stays at $x$ with the remaining probability).
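The sampling rule and the Metropolis filter described above can be sketched as follows. This is a minimal dependency-free illustration under stated assumptions, not the authors' implementation: the function names, the unit-square polytope, and the choice $r = 0.5$ are all hypothetical, and a Cholesky factor of $H(x)$ is used in place of the symmetric square root $H(x)^{-1/2}$ (both give a proposal with covariance $\frac{r^2}{n} H(x)^{-1}$).

```python
import math
import random

def hessian(A, b, x):
    """H(x) = sum_i a_i a_i^T / (a_i^T x - b_i)^2, or None if x is not interior."""
    n = len(x)
    H = [[0.0] * n for _ in range(n)]
    for a, b_i in zip(A, b):
        s = sum(a_k * x_k for a_k, x_k in zip(a, x)) - b_i
        if s <= 0.0:
            return None
        for j in range(n):
            for k in range(n):
                H[j][k] += a[j] * a[k] / s ** 2
    return H

def cholesky(H):
    """Lower-triangular L with H = L L^T, for positive definite H."""
    n = len(H)
    L = [[0.0] * n for _ in range(n)]
    for j in range(n):
        for k in range(j + 1):
            acc = H[j][k] - sum(L[j][t] * L[k][t] for t in range(k))
            L[j][k] = math.sqrt(acc) if j == k else acc / L[k][k]
    return L

def log_g(H_center, center, point, r):
    """log g_center(point), dropping the constant (n/2) log(n / (2 pi r^2))."""
    n = len(center)
    d = [p - c for p, c in zip(point, center)]
    quad = sum(d[j] * H_center[j][k] * d[k] for j in range(n) for k in range(n))
    L = cholesky(H_center)
    half_logdet = sum(math.log(L[j][j]) for j in range(n))  # (1/2) log det H
    return half_logdet - n * quad / (2.0 * r ** 2)

def dikin_step(A, b, x, r, rng):
    """One step of the Gaussian Dikin walk from an interior point x."""
    n = len(x)
    H_x = hessian(A, b, x)
    L = cholesky(H_x)
    g = [rng.gauss(0.0, 1.0) for _ in range(n)]
    # Solve L^T u = g, so that u has covariance (L L^T)^{-1} = H(x)^{-1}.
    u = [0.0] * n
    for j in reversed(range(n)):
        u[j] = (g[j] - sum(L[t][j] * u[t] for t in range(j + 1, n))) / L[j][j]
    z = [x_k + r / math.sqrt(n) * u_k for x_k, u_k in zip(x, u)]
    H_z = hessian(A, b, z)
    if H_z is None:
        return x  # proposal is outside the interior of K: stay at x
    # Metropolis filter: accept with probability min{1, g_z(x) / g_x(z)}.
    log_ratio = log_g(H_z, z, x, r) - log_g(H_x, x, z, r)
    return z if rng.random() < math.exp(min(0.0, log_ratio)) else x

# Assumed example: 1000 steps in the unit square, started at its center.
A = [[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]]
b = [0.0, -1.0, 0.0, -1.0]
rng = random.Random(0)
x = [0.5, 0.5]
for _ in range(1000):
    x = dikin_step(A, b, x, 0.5, rng)
```

Working with the logarithm of the ratio $g_z(x)/g_x(z)$ avoids evaluating the normalizing constants, which cancel, and keeps the acceptance test numerically stable.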


1.2. Hilbert Metric, Isoperimetry, and Mixing Time

We introduce the distance function which plays an important role in establishing the mixing time of the Dikin walk. Given two points $x, y \in K$, let $p, q$ be the end points of the chord in $K$ passing through $x, y$, such that the points lie in the order $p, x, y, q$. We define

\[ \sigma(x, y) := \frac{|xy|\,|pq|}{|px|\,|qy|}, \]

where $|xy|$ denotes the length of the line segment $xy$. $\log(1 + \sigma(x, y))$ is a metric on $K$, known as the Hilbert metric.

Lovász proved the following theorem for any random walk on $K$: Suppose for any two initial points $x, y \in K$ that are close in $\sigma$ distance, the statistical distance of the distributions after one step of the walk each from $x$ and $y$ is bounded away from 1. Then, the lazy version of the random walk (where we stay at the current point with probability 1/2 at each step) mixes rapidly.

Theorem 1 (Lovász). Consider a reversible random walk in $K$ with its stationary distribution being uniform on $K$. Suppose $\exists \Delta > 0$ such that for all $x, y \in K$ with $\sigma(x, y) \le \Delta$, we have $\|p_x - p_y\|_1 \le 1 - \Omega(1)$, where $p_x$ denotes the distribution after one step of the random walk from $x$. Then, after $O(\Delta^{-2})$ steps, the lazy version of the walk from a warm start is within 1/4 total variation distance from the uniform distribution on $K$.


Kannan and Narayanan proved that the transition function of the uniform Dikin walk, $p_x$, for $x \in \mathrm{int}(K)$, satisfies the hypothesis of the theorem above with $\Delta = \Omega\left(\frac{1}{\sqrt{mn}}\right)$, thus implying that it mixes in $O(mn)$ steps from a warm start. An analogous result for the Gaussian Dikin walk is implicit in the work of Narayanan. Our main contribution is an alternative and simple proof of their main technical contributions. In particular, we prove the following theorem.


Theorem 2. Let $\epsilon \in (0, 1/2]$. For the Gaussian Dikin walk
on K with r < ^(log 2 ^)-V 2 ^ points x,y £ 

K such that ||a; — y\\,^ < we have \\px “Py||^ < e. 

In order to use this theorem along with Theorem 1 to obtain the claimed mixing time bound, one needs a simple fact that, for any $x, y$ in a polytope $K$, which is described using $m$ inequalities, $\sigma(x, y) \ge \frac{1}{\sqrt{m}} \|x - y\|_x$. A proof of this fact is given in the appendix; see Lemma 9.
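To make the cross-ratio $\sigma(x, y)$ and its comparison with the local norm concrete, here is a small Python sketch. It parametrizes the chord as $x + t(y - x)$, so that $x$ sits at $t = 0$ and $y$ at $t = 1$, and finds the endpoints $p, q$ from the constraints. The function names and the unit-square example are assumptions made for illustration only.

```python
import math

def cross_ratio(A, b, x, y):
    """sigma(x, y) = |xy| |pq| / (|px| |qy|) for distinct interior points x, y."""
    d = [y_k - x_k for x_k, y_k in zip(x, y)]
    t_lo, t_hi = -math.inf, math.inf
    for a, b_i in zip(A, b):
        s = sum(a_k * x_k for a_k, x_k in zip(a, x)) - b_i  # slack at x
        v = sum(a_k * d_k for a_k, d_k in zip(a, d))        # a_i^T (y - x)
        # The chord point x + t d satisfies constraint i iff s + t v >= 0.
        if v > 0:
            t_lo = max(t_lo, -s / v)
        elif v < 0:
            t_hi = min(t_hi, -s / v)
    # p = x + t_lo d and q = x + t_hi d; the common factor |d|^2 cancels.
    return (t_hi - t_lo) / ((-t_lo) * (t_hi - 1.0))

def local_norm(A, b, x, v):
    """||v||_x = sqrt(sum_i (a_i^T v)^2 / (a_i^T x - b_i)^2)."""
    total = 0.0
    for a, b_i in zip(A, b):
        s = sum(a_k * x_k for a_k, x_k in zip(a, x)) - b_i
        total += (sum(a_k * v_k for a_k, v_k in zip(a, v)) / s) ** 2
    return math.sqrt(total)

# Assumed example: the unit square, with x at the center.
A = [[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]]
b = [0.0, -1.0, 0.0, -1.0]
x, y = [0.5, 0.5], [0.75, 0.5]
sigma = cross_ratio(A, b, x, y)
lower = local_norm(A, b, x, [y[0] - x[0], y[1] - x[1]]) / math.sqrt(len(A))
```

For this pair the chord runs from $p = (0, 0.5)$ to $q = (1, 0.5)$, so $\sigma(x, y) = (0.25 \cdot 1)/(0.5 \cdot 0.25) = 2$, while $\|x - y\|_x / \sqrt{m} = \sqrt{0.5}/2 \approx 0.354$.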

The following two lemmas are the main ingredients in the proof of Theorem 2: (1) If two points $x, y$ are close in the local norm, then the two Gaussian distributions $g_x$ and $g_y$ are close in statistical distance. (2) If $r$ is small enough (as a function of $\epsilon$), then for all $x$, $p_x$ and $g_x$ are $\epsilon$-close in statistical distance.

Lemma 3. Let $r \le 1$, and $c \ge 0$ be such that $c \le \min\{r, 1/3\}$. Let $x, y \in K$. If $\|x - y\|_x \le \frac{c}{\sqrt{n}}$, then $\|g_x - g_y\|_1 \le 3c$.

This lemma relies on a well-known fact about the Kullback-Leibler divergence between two multivariate Gaussian distributions, and Pinsker's inequality that bounds the statistical distance between two distributions in terms of their divergence.

Lemma 4. Given e £ [0, 1 / 2 ], for r < Y|g(log we 

have $\|p_x - g_x\|_1 \le \epsilon$.

This lemma, which shows that the Metropolis filter does not change the distribution much, relies on a result on the concentration of Gaussian polynomials, proved using hypercontractivity. Given the above lemmas, Theorem 2 follows by applying the triangle inequality.
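Spelled out, the triangle-inequality step splits the distance between the two walk distributions into three pieces: the outer two are controlled by Lemma 4 (applied at $x$ and at $y$) and the middle one by Lemma 3. The display below is only a sketch of that decomposition; how the error budget $\epsilon$ is divided among the three terms is not reproduced here.

```latex
\[
\|p_x - p_y\|_1 \;\le\; \|p_x - g_x\|_1 \;+\; \|g_x - g_y\|_1 \;+\; \|g_y - p_y\|_1 .
\]
```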

2. Statistical distance between Gaussians and the local norm

In this section, we present a proof of Lemma 3 that bounds the statistical distance between $g_x$ and $g_y$ for two points $x, y$ that are close in the local norm. We need the following well-known fact about the Kullback-Leibler divergence between two multivariate Gaussian distributions.

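Fact 5. Let $G_1 = \mathcal{N}(\mu_1, \Sigma_1)$ and $G_2 = \mathcal{N}(\mu_2, \Sigma_2)$ be two $n$-dimensional Gaussian distributions. Then,

\[ D_{KL}(G_2 \| G_1) = \frac{1}{2}\left( \mathrm{Tr}(\Sigma_1^{-1}\Sigma_2) - n + \log\frac{\det \Sigma_1}{\det \Sigma_2} + (\mu_1 - \mu_2)^\top \Sigma_1^{-1} (\mu_1 - \mu_2) \right), \]

where $D_{KL}$ denotes the Kullback-Leibler divergence

\[ D_{KL}(P \| Q) = \int \log\frac{dP}{dQ} \, dP(x). \]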

In order to use this fact, we have to bound the eigenvalues of $H(x)H(y)^{-1}$. For $x, y$ that are close in the local norm, this follows since $H(x) \approx H(y)$.
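As a sanity check of how Fact 5 and Pinsker's inequality fit together, the sketch below specializes to $n = 1$ and the interval $K = [0, 1]$, where $H(x) = 1/x^2 + 1/(1-x)^2$ and $g_x = \mathcal{N}(x, r^2 / H(x))$. It compares the closed-form divergence with a numerically integrated $L_1$ distance. The function names, the points $0.30$ and $0.32$, the value $r = 0.5$, and the integration grid are all illustrative assumptions.

```python
import math

def kl_gaussians_1d(mu1, var1, mu2, var2):
    """D_KL(G2 || G1) for G1 = N(mu1, var1), G2 = N(mu2, var2): Fact 5 with n = 1."""
    return 0.5 * (var2 / var1 - 1.0 + math.log(var1 / var2) + (mu1 - mu2) ** 2 / var1)

def l1_distance_1d(mu1, var1, mu2, var2, lo, hi, steps=20000):
    """Midpoint-rule approximation of the integral of |G1 - G2| over [lo, hi]."""
    def pdf(t, mu, var):
        return math.exp(-(t - mu) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)
    h = (hi - lo) / steps
    total = 0.0
    for k in range(steps):
        t = lo + (k + 0.5) * h
        total += abs(pdf(t, mu1, var1) - pdf(t, mu2, var2))
    return h * total

def hessian_interval(x):
    """H(x) for K = [0, 1], i.e. the constraints x >= 0 and -x >= -1."""
    return 1.0 / x ** 2 + 1.0 / (1.0 - x) ** 2

r, x, y = 0.5, 0.30, 0.32
var_x = r ** 2 / hessian_interval(x)  # covariance (r^2 / n) H(x)^{-1} with n = 1
var_y = r ** 2 / hessian_interval(y)
kl = kl_gaussians_1d(x, var_x, y, var_y)  # D_KL(g_y || g_x)
l1 = l1_distance_1d(x, var_x, y, var_y, -2.0, 3.0)
pinsker_holds = l1 ** 2 <= 2.0 * kl
```

In one dimension the only eigenvalue of $H(x)H(y)^{-1}$ is the ratio `var_y / var_x`, so `kl` is exactly half of $\lambda - 1 + \log\frac{1}{\lambda} + \frac{n}{r^2}\|x - y\|_x^2$ with $\lambda$ equal to that ratio.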
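Proof of Lemma 3. From the assumption, we have,

\[ \frac{c^2}{n} \ge \|x - y\|_x^2 = \sum_{i \in [m]} \frac{(a_i^\top (x - y))^2}{(a_i^\top x - b_i)^2} \ge \max_{i \in [m]} \frac{(a_i^\top (x - y))^2}{(a_i^\top x - b_i)^2}. \]

Thus, for all $i \in [m]$, we have

\[ \left(1 - \frac{c}{\sqrt{n}}\right)(a_i^\top x - b_i) \le (a_i^\top y - b_i) \le \left(1 + \frac{c}{\sqrt{n}}\right)(a_i^\top x - b_i). \]

By the definition of $H$, we get,

\[ \left(1 - \frac{c}{\sqrt{n}}\right)^2 H(y) \preceq H(x) \preceq \left(1 + \frac{c}{\sqrt{n}}\right)^2 H(y). \]

Thus, all eigenvalues $\lambda_1, \ldots, \lambda_n > 0$ of $H(x)H(y)^{-1}$ satisfy

\[ \left(1 - \frac{c}{\sqrt{n}}\right)^2 \le \lambda_i \le \left(1 + \frac{c}{\sqrt{n}}\right)^2. \]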

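We can now bound the statistical distance between $g_x = \mathcal{N}\left(x, \frac{r^2}{n} H(x)^{-1}\right)$ and $g_y = \mathcal{N}\left(y, \frac{r^2}{n} H(y)^{-1}\right)$ by using Pinsker's inequality, which gives that $\|g_x - g_y\|_1^2 \le 2 \cdot D_{KL}(g_y \| g_x)$. Letting $\Sigma_1, \Sigma_2$ denote the covariance matrices of $g_x, g_y$, we can write $\mathrm{Tr}(\Sigma_1^{-1}\Sigma_2) = \mathrm{Tr}(H(x)H(y)^{-1}) = \sum_{i=1}^n \lambda_i$, and $\log\frac{\det \Sigma_1}{\det \Sigma_2} = \log\det\left(H(y)H(x)^{-1}\right) = \sum_{i=1}^n \log\frac{1}{\lambda_i}$. Thus,

\[
\begin{aligned}
\|g_x - g_y\|_1^2 &\le \sum_{i=1}^n \left(\lambda_i - 1 + \log\frac{1}{\lambda_i}\right) + \frac{n}{r^2}\|x - y\|_x^2 && \text{(Using Fact 5)} \\
&\le \sum_{i=1}^n \left(\lambda_i + \frac{1}{\lambda_i} - 2\right) + \frac{n}{r^2}\|x - y\|_x^2 && \text{(Using } \log t \le t - 1\text{)} \\
&\le n \cdot \max\left\{ \frac{(2c/\sqrt{n} - c^2/n)^2}{(1 - c/\sqrt{n})^2}, \frac{(2c/\sqrt{n} + c^2/n)^2}{(1 + c/\sqrt{n})^2} \right\} + \frac{c^2}{r^2} && \text{(Using the convexity of } \lambda + \tfrac{1}{\lambda} - 2\text{)} \\
&\le n \cdot \frac{(2c/\sqrt{n} - c^2/n)^2}{(1 - c/\sqrt{n})^2} + \frac{c^2}{r^2} \\
&= \frac{c^2 (2 - c/\sqrt{n})^2}{(1 - c/\sqrt{n})^2} + \frac{c^2}{r^2} \le 9c^2,
\end{aligned}
\]

where the last line uses the assumptions on $c$ and $r$, and $n \ge 1$. □

3. The effect of the Metropolis filter

In this section, we prove Lemma 4 that shows that for any $x \in K$, the statistical distance between the Gaussian distribution $g_x$ and the random walk distribution $p_x$, obtained by applying the Metropolis filter to $g_x$, is small. We have,

\[ \|p_x - g_x\|_1 = 1 - \mathbb{E}_{z \sim g_x} \min\left\{1, \frac{g_z(x)}{g_x(z)}\right\}. \qquad (1) \]

Given $\epsilon \in (0, 1/2]$, we show that for an appropriate choice of $r$, the above statistical distance is bounded by $\epsilon$.

The ratio of $g_z(x)$ and $g_x(z)$ has two terms: one involving the ratio of $\det H(x)$ and $\det H(z)$, and one involving the difference in local norms $\|z - x\|_z^2 - \|z - x\|_x^2$. Proposition 6 bounds the first by controlling the norm of $\nabla \log\det H(x)$. Proposition 7 bounds the second term by using concentration of Gaussian polynomials.

Proof of Lemma 4. We have,

\[ \frac{g_z(x)}{g_x(z)} = \exp\left( -\frac{n}{2r^2}\left(\|z - x\|_z^2 - \|z - x\|_x^2\right) + \frac{1}{2}\left(\log\det H(z) - \log\det H(x)\right) \right). \]

From Proposition 6, for $r \le \frac{\epsilon}{4}\left(2\log\frac{4}{\epsilon}\right)^{-1/2}$, we have

\[ \Pr\left[\log\det H(z) - \log\det H(x) \ge -\epsilon/2\right] \ge 1 - \epsilon/4. \]

Also, from Proposition 7 applied with $\epsilon/4$ in place of $\epsilon$, for $r$ as in the statement of the lemma, we have,

\[ \Pr\left[\|z - x\|_z^2 - \|z - x\|_x^2 \le \frac{\epsilon}{2} \cdot \frac{r^2}{n}\right] \ge 1 - \epsilon/4. \]

Combining the two using a union bound, we get that except with probability $\epsilon/2$, we have, $\frac{g_z(x)}{g_x(z)} \ge e^{-\epsilon/2} \ge 1 - \frac{\epsilon}{2}$. Thus,

\[ \mathbb{E}_{z \sim g_x} \min\left\{1, \frac{g_z(x)}{g_x(z)}\right\} \ge \left(1 - \frac{\epsilon}{2}\right) \Pr_{z \sim g_x}\left[\frac{g_z(x)}{g_x(z)} \ge 1 - \frac{\epsilon}{2}\right] \ge 1 - \epsilon. \]

The claim now follows from (1). □

Proposition 6. Given $\epsilon \in (0, 1/2]$, for $r \le \frac{\epsilon}{\sqrt{2\log(1/\epsilon)}}$, and $z \sim g_x$ we have

\[ \Pr\left[\log\det H(z) - \log\det H(x) \ge -2\epsilon\right] \ge 1 - \epsilon. \]

Proof of Proposition 6. Let $V(x) := \frac{1}{2}\log\det H(x)$. From the work of Vaidya, we know that $V(x)$ is a convex function. Thus, $V(z) - V(x) \ge (z - x)^\top \nabla V(x)$. We know that $z = x + \frac{r}{\sqrt{n}} (H(x))^{-1/2} g$, where $g \sim \mathcal{N}(0, I_n)$. Thus,

\[ V(z) - V(x) \ge \frac{r}{\sqrt{n}} g^\top (H(x))^{-1/2} \nabla V(x). \]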


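Here, $\frac{r}{\sqrt{n}} g^\top (H(x))^{-1/2} \nabla V(x)$ is a Gaussian with mean 0 and variance $\frac{r^2}{n}\left\|(H(x))^{-1/2}\nabla V(x)\right\|_2^2$. From Lemma 4.3 in the work of Vaidya and Atkinson [13], it follows that

\[ \left\|(H(x))^{-1/2}\nabla V(x)\right\|_2^2 \le n. \]

Using standard tail bounds, we get that for all $\lambda > 0$,

\[ \Pr\left[g^\top (H(x))^{-1/2}\nabla V(x) \ge -\lambda\sqrt{n}\right] \ge 1 - \exp(-\lambda^2/2). \]

Picking $\lambda = \sqrt{2\log(1/\epsilon)}$, and combining, we get, $\Pr\left[V(z) - V(x) \ge -r\sqrt{2\log(1/\epsilon)}\right] \ge 1 - \epsilon$. For $r \le \frac{\epsilon}{\sqrt{2\log(1/\epsilon)}}$, we have $r\sqrt{2\log(1/\epsilon)} \le \epsilon$, which gives the claim. □

Proposition 7. Given $\epsilon \in (0, 1/2]$, for $r$ at most a suitable constant multiple of $\epsilon\left(\log\frac{11}{\epsilon}\right)^{-3/2}$, and $z \sim g_x$, we have,

\[ \Pr\left[\|z - x\|_z^2 - \|z - x\|_x^2 \le 2\epsilon \cdot \frac{r^2}{n}\right] \ge 1 - \epsilon. \]

Proof of Proposition 7. We have $z = x + \frac{r}{\sqrt{n}} (H(x))^{-1/2} g$, where $g \sim \mathcal{N}(0, I_n)$. If we let $d_i = \frac{1}{a_i^\top x - b_i} (H(x))^{-1/2} a_i$, we get $a_i^\top (z - x) = \frac{r}{\sqrt{n}} (a_i^\top x - b_i) \cdot d_i^\top g$, and $\sum_{i=1}^m d_i d_i^\top = I_n$. Thus,

\[
\begin{aligned}
\|z - x\|_z^2 - \|z - x\|_x^2 &= \sum_{i=1}^m (a_i^\top (z - x))^2 \left( \frac{1}{(a_i^\top z - b_i)^2} - \frac{1}{(a_i^\top x - b_i)^2} \right) \\
&= \frac{r^2}{n} \sum_{i=1}^m (d_i^\top g)^2 \left( \frac{1}{\left(1 + \frac{r}{\sqrt{n}} d_i^\top g\right)^2} - 1 \right) \\
&= -\frac{2r^3}{n^{3/2}} \sum_{i=1}^m (d_i^\top g)^3 + \frac{r^4}{n^2} \sum_{i=1}^m (d_i^\top g)^4 \cdot \frac{3 + \frac{2r}{\sqrt{n}} d_i^\top g}{\left(1 + \frac{r}{\sqrt{n}} d_i^\top g\right)^2}. \qquad (2)
\end{aligned}
\]

We now use concentration of Gaussian polynomials (see Theorem 8) to bound the two terms above. Let $P_1(g) := \sum_{i=1}^m (d_i^\top g)^3$. From Fact 10, we know $\mathbb{E}_g P_1(g)^2 \le 15n$. Thus, using Theorem 8, we know that for any $\lambda_1 \ge (\sqrt{2e})^3$,

\[ \Pr_g\left[|P_1(g)| \ge \lambda_1\sqrt{15n}\right] \le \exp\left(-\frac{3}{2e}\lambda_1^{2/3}\right). \]

Picking $\lambda_1 = \left(\max\left\{2e, \frac{2e}{3}\log\frac{4}{\epsilon}\right\}\right)^{3/2}$, and $r \le \frac{\epsilon}{2\sqrt{15}\lambda_1}$, we obtain, $\Pr\left[|P_1(g)| \ge \frac{\epsilon}{2r}\sqrt{n}\right] \le \frac{\epsilon}{4}$. Thus, with probability at least $1 - \frac{\epsilon}{4}$,

\[ \frac{2r^3}{n^{3/2}} \left|\sum_{i=1}^m (d_i^\top g)^3\right| \le \frac{2r^3}{n^{3/2}} \cdot \frac{\epsilon}{2r}\sqrt{n} = \epsilon \cdot \frac{r^2}{n}. \qquad (3) \]

Now, we let $P_2(g) := \sum_{i=1}^m (d_i^\top g)^4$. Again, from Fact 10, we know that $\mathbb{E}_g P_2(g)^2 \le 105n^2$, and applying Theorem 8 we obtain that for $\lambda_2 = \left(\max\left\{2e, \frac{e}{2}\log\frac{4}{\epsilon}\right\}\right)^2$ and $r \le \frac{\sqrt{\epsilon}}{\sqrt{8\lambda_2\sqrt{105}}}$, we obtain, $\Pr\left[|P_2(g)| \ge \frac{\epsilon n}{8r^2}\right] \le \frac{\epsilon}{4}$. Thus, with probability at least $1 - \frac{\epsilon}{4}$,

\[ \frac{r^4}{n^2} \sum_{i=1}^m (d_i^\top g)^4 \le \frac{r^4}{n^2} \cdot \frac{\epsilon n}{8r^2} = \frac{\epsilon}{8} \cdot \frac{r^2}{n}. \]

Note that this also implies that for all $i$, $\frac{r}{\sqrt{n}} |d_i^\top g| \le \left(\frac{\epsilon r^2}{8n}\right)^{1/4} \le \frac{1}{2}$, where the last inequality holds for all $r \le 1$. Thus, with probability at least $1 - \frac{\epsilon}{4}$ we have

\[ \frac{r^4}{n^2} \sum_{i=1}^m (d_i^\top g)^4 \cdot \frac{3 + \frac{2r}{\sqrt{n}} d_i^\top g}{\left(1 + \frac{r}{\sqrt{n}} d_i^\top g\right)^2} \le 8 \cdot \frac{r^4}{n^2} \sum_{i=1}^m (d_i^\top g)^4 \le \epsilon \cdot \frac{r^2}{n}. \]

Combining this with Equations (2) and (3), and applying a union bound, we get that with probability at least $1 - \epsilon$,

\[ \|z - x\|_z^2 - \|z - x\|_x^2 \le 2\epsilon \cdot \frac{r^2}{n}. \]

Finally, we verify that for $\epsilon \in (0, 1/2]$, any $r$ as in the statement of the proposition satisfies the conditions

\[ r \le \min\left\{ 1, \frac{\epsilon}{2\sqrt{15}\lambda_1}, \frac{\sqrt{\epsilon}}{\sqrt{8\lambda_2\sqrt{105}}} \right\}. \]

□

Theorem 8 (see Janson, Thm 6.7). Let $P(g)$ be a degree $q$ polynomial, where $g \in \mathbb{R}^n$ is such that $g \sim \mathcal{N}(0, I_n)$. Then, for any $t \ge (\sqrt{2e})^q$, we have,

\[ \Pr_g\left[|P(g)| \ge t \left(\mathbb{E} P(g)^2\right)^{1/2}\right] \le \exp\left(-\frac{q}{2e} t^{2/q}\right). \]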
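Appendix A. Relating the local metric to the Hilbert metric

Lemma 9. For any $x, y \in K$, we have $\sigma(x, y) \ge \frac{1}{\sqrt{m}} \|x - y\|_x$.

Proof of Lemma 9. Let $p, x, y, q$ be the points in order on the chord of $K$ that passes through $x, y$ with $p, q$ being the end-points of the chord. Thus,

\[
\begin{aligned}
\sigma(x, y) = \frac{|x - y|\,|p - q|}{|p - x|\,|q - y|} &\ge \max\left\{ \frac{|x - y|}{|p - x|}, \frac{|x - y|}{|x - q|} \right\} = \max_{i \in [m]} \frac{|a_i^\top (x - y)|}{a_i^\top x - b_i} \\
&\ge \frac{1}{\sqrt{m}} \left( \sum_{i \in [m]} \frac{(a_i^\top (x - y))^2}{(a_i^\top x - b_i)^2} \right)^{1/2} = \frac{1}{\sqrt{m}} \|x - y\|_x.
\end{aligned}
\]

□

Appendix B. Moments of Gaussian Polynomials

Fact 10. Suppose $g \in \mathbb{R}^n$ is distributed according to $\mathcal{N}(0, I_n)$, and $\sum_{i=1}^m b_i b_i^\top = I_n$. Then, we have,

\[ \mathbb{E}\left(\sum_{i=1}^m (b_i^\top g)^3\right)^2 \le 15n, \quad \text{and} \quad \mathbb{E}\left(\sum_{i=1}^m (b_i^\top g)^4\right)^2 \le 105n^2. \]

Proof of Fact 10. We first consider the first part of the fact. From Fact 11, we know that for all $i, j$,

\[ \mathbb{E}(b_i^\top g)^3 (b_j^\top g)^3 = 9\|b_i\|^2\|b_j\|^2 (b_i^\top b_j) + 6(b_i^\top b_j)^3. \]

Summing over all $i, j$, we get,

\[ \mathbb{E}\left(\sum_{i=1}^m (b_i^\top g)^3\right)^2 = \sum_{i,j=1}^m \mathbb{E}(b_i^\top g)^3 (b_j^\top g)^3 = 9\sum_{i,j=1}^m \|b_i\|^2\|b_j\|^2 (b_i^\top b_j) + 6\sum_{i,j=1}^m (b_i^\top b_j)^3. \qquad (B.1) \]

This equality can also be derived using Isserlis' theorem. If we let $B$ be the $m \times n$ matrix with its $i$th row being $b_i^\top$, and $w \in \mathbb{R}^m$ be such that $w_i = \|b_i\|^2$, we can simplify the first term in the above sum as follows.

\[ \sum_{i,j=1}^m \|b_i\|^2\|b_j\|^2 (b_i^\top b_j) = \left\|\sum_{i=1}^m \|b_i\|^2 b_i\right\|_2^2 = \|B^\top w\|_2^2. \]

Using $\sum_i b_i b_i^\top = I_n$, we get $B^\top B = I_n$. Thus, the $m \times m$ matrix $\Pi := BB^\top$ satisfies $\Pi^2 = \Pi$. Since $\Pi$ is also symmetric, it is an orthogonal projection. Thus, we have $\|\Pi w\|_2 \le \|w\|_2$. We obtain,

\[ \|B^\top w\|_2^2 = w^\top BB^\top w = w^\top \Pi w = w^\top \Pi^2 w = \|\Pi w\|_2^2 \le \|w\|_2^2 = \sum_{i=1}^m \|b_i\|^4. \]

Since $\sum_i b_i b_i^\top = I_n$, we get that for all $i$, $\|b_i\| \le 1$. Moreover, taking trace, we obtain $\sum_{i=1}^m \|b_i\|^2 = n$. Thus,

\[ \sum_{i=1}^m \|b_i\|^4 \le \sum_{i=1}^m \|b_i\|^2 = n. \]

Thus, we can bound the first term in Equation (B.1) by $9n$.

For the second term in Equation (B.1), using $\|b_i\|^2 \le 1$ for all $i$, and Cauchy-Schwarz, we get $|b_i^\top b_j| \le 1$ for all $i, j$. Using $\sum_i b_i b_i^\top = I_n$, we also know that for all $j$, $\sum_{i=1}^m (b_i^\top b_j)^2 = \|b_j\|^2$. Thus, we get,

\[ \sum_{i,j=1}^m (b_i^\top b_j)^3 \le \sum_{i,j=1}^m (b_i^\top b_j)^2 = \sum_{j=1}^m \|b_j\|^2 = n. \]

Combining the bounds for the two terms in Equation (B.1), we get the first part of the fact.

For the second part of the fact, we use the Cauchy-Schwarz inequality,

\[ \mathbb{E}\left(\sum_{i=1}^m (b_i^\top g)^4\right)^2 = \sum_{i,j=1}^m \mathbb{E}(b_i^\top g)^4 (b_j^\top g)^4 \le \sum_{i,j=1}^m \left(\mathbb{E}(b_i^\top g)^8\right)^{1/2} \left(\mathbb{E}(b_j^\top g)^8\right)^{1/2}. \]

We have that $b_i^\top g$ is distributed as a Gaussian with mean 0 and variance $\|b_i\|^2$. Thus, $\mathbb{E}(b_i^\top g)^8 = 105\|b_i\|^8$. Hence, we get,

\[ \mathbb{E}\left(\sum_{i=1}^m (b_i^\top g)^4\right)^2 \le 105\sum_{i,j=1}^m \|b_i\|^4\|b_j\|^4 = 105\left(\sum_{i=1}^m \|b_i\|^4\right)^2 \le 105n^2, \]

proving the second part of the fact. □

Fact 11 (Isserlis). Suppose $g \in \mathbb{R}^n$ is distributed according to $\mathcal{N}(0, I_n)$, and $b_1, b_2$ are any two vectors in $\mathbb{R}^n$. Then,

\[ \mathbb{E}(b_1^\top g)^3 (b_2^\top g)^3 = 9\|b_1\|^2\|b_2\|^2 (b_1^\top b_2) + 6(b_1^\top b_2)^3. \]

Proof of Fact 11. We define $\hat{b}_i := \frac{1}{\|b_i\|} \cdot b_i$ to be the corresponding unit vectors. Thus,

\[ \mathbb{E}(b_1^\top g)^3 (b_2^\top g)^3 = \|b_1\|^3\|b_2\|^3 \, \mathbb{E}(\hat{b}_1^\top g)^3 (\hat{b}_2^\top g)^3. \qquad (B.2) \]

We let $e_1, \ldots, e_n$ denote the standard basis vectors for $\mathbb{R}^n$, i.e., $e_i$ is 1 in the $i$th coordinate and 0 elsewhere. Since the distribution of $g$ is rotationally symmetric, we can assume that $\hat{b}_1 = e_1$, and $\hat{b}_2 = \cos\theta \cdot e_1 + \sin\theta \cdot e_2$, where $\theta$ is such that $\cos\theta = \hat{b}_1^\top \hat{b}_2$. Thus,

\[
\begin{aligned}
\mathbb{E}(\hat{b}_1^\top g)^3 (\hat{b}_2^\top g)^3 &= \mathbb{E}\, g_1^3 (\cos\theta \cdot g_1 + \sin\theta \cdot g_2)^3 \\
&= \cos^3\theta \, \mathbb{E} g_1^6 + 0 + 3\cos\theta\sin^2\theta \, \mathbb{E} g_1^4 \, \mathbb{E} g_2^2 + 0 \\
&= 15\cos^3\theta + 9\cos\theta\sin^2\theta = 9\cos\theta + 6\cos^3\theta \\
&= 9(\hat{b}_1^\top \hat{b}_2) + 6(\hat{b}_1^\top \hat{b}_2)^3.
\end{aligned}
\]

Combining with Equation (B.2), we obtain the fact. □

Note: The bounds given by the above fact are tight for the case where $b_i$ form an orthonormal basis.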
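The first bound of Fact 10 can be checked exactly on small examples by summing the identity of Fact 11 over all pairs, as in Equation (B.1). The Python sketch below does this for two assumed families of vectors satisfying $\sum_i b_i b_i^\top = I_2$: the standard basis, and the vectors $d_i$ arising at the center of the unit square. The function names and both examples are illustrative assumptions rather than part of the original text.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def second_moment_cubic(B):
    """E (sum_i (b_i^T g)^3)^2 for g ~ N(0, I), via Fact 11 summed over all (i, j)."""
    total = 0.0
    for bi in B:
        for bj in B:
            c = dot(bi, bj)
            total += 9.0 * dot(bi, bi) * dot(bj, bj) * c + 6.0 * c ** 3
    return total

def is_isotropic(B, tol=1e-12):
    """Check the hypothesis sum_i b_i b_i^T = I_n up to the tolerance tol."""
    n = len(B[0])
    return all(
        abs(sum(b[j] * b[k] for b in B) - (1.0 if j == k else 0.0)) <= tol
        for j in range(n) for k in range(n)
    )

n = 2
# Orthonormal basis: the bound 15 n is attained.
B_basis = [[1.0, 0.0], [0.0, 1.0]]
# Vectors d_i at the center of the unit square: +-e_1 / sqrt(2), +-e_2 / sqrt(2).
s = 1.0 / math.sqrt(2.0)
B_square = [[s, 0.0], [-s, 0.0], [0.0, s], [0.0, -s]]

moment_basis = second_moment_cubic(B_basis)
moment_square = second_moment_cubic(B_square)
```

For the orthonormal basis the sum equals $15n = 30$, matching the tightness remark above. For the symmetric square example the cubic polynomial $\sum_i (b_i^\top g)^3$ vanishes identically, so the second moment is zero up to rounding.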